Search CORE

23 research outputs found

Algorithms and Architectures for Network Search Processors

Author: Dharmapurikar Sarang
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2006
Field of study

The continuous growth in the Internet’s size, the amount of data traﬃc, and the complexity of processing this traﬃc gives rise to new challenges in building high-performance network devices. One of the most fundamental tasks performed by these devices is searching the network data for predeﬁned keys. Address lookup, packet classiﬁcation, and deep packet inspection are some of the operations which involve table lookups and searching. These operations are typically part of the packet forwarding mechanism, and can create a performance bottleneck. Therefore, fast and resource eﬃcient algorithms are required. One of the most commonly used techniques for such searching operations is the Ternary Content Addressable Memory (TCAM). While TCAM can oﬀer very fast search speeds, it is costly and consumes a large amount of power. Hence, designing cost-eﬀective, power-eﬃcient, and high-speed search techniques has received a great deal of attention in the research and industrial community. In this thesis, we propose a generic search technique based on Bloom ﬁlters. A Bloom ﬁlter is a randomized data structure used to represent a set of bit-strings compactly and support set membership queries. We demonstrate techniques to convert the search process into table lookups. The resulting table data structures are kept in the oﬀ-chip memory and their Bloom ﬁlter representations are kept in the on-chip memory. An item needs to be looked up in the oﬀ-chip table only when it is found in the on-chip Bloom ﬁlters. By ﬁltering the oﬀ-chip memory accesses in this fashion, the search operations can be signiﬁcantly accelerated. Our approach involves a unique combination of algorithmic and architectural techniques that outperform some of the current techniques in terms of cost-eﬀectiveness, speed, and power-eﬃciency

CiteSeerX

Washington University St. Louis: Open Scholarship

Synthesizable Design of a Multi-Module Memory Controller

Author: Dharmapurikar Sarang
Lockwood John W.
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2001
Field of study

Random Access Memory (RAM) is a common resources needed by networking hardware modules. Synchronous Dynamic RAM (SDRAM) provides a cost effective solution for such data storage. As the packet processing speeds in the hardware increase memory throughput can be a bottleneck to achieve overall high performance. Typically there are multiple hardware modules which perform different operations on the packet payload and hence all try to access the common packet buffer simultaneously. This gives rise to a need for a memory controller which arbitrates between the memory requests made by different modules and maximizes the memory throughput. This paper discusses the design and implementation of a SDRAM controller which satisfies both the requirements. The memory throughput depends on the burst lengths, the address pattern of the memory accesses and the type of memory access (read/write). Given the information about the current SDRAM access and the pending SDRAM access requests, the controller finds the memory access request among the pending requests which utilizes the data bus most efficiently and increases the throughput. This leads to the re-ordering of the memory requests between modules. Results show how this controller improves the overall throughput

Washington University St. Louis: Open Scholarship

Fast Packet Classification Using Bloom Filters

Author: Dharmapurikar Sarang
Lockwood John
Song Haoyu
Turner Jonathan
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2006
Field of study

While the problem of general packet classification has received a great deal of attention from researchers over the last ten years, there is still no really satisfactory solution. Ternary Content Addressable Memory (TCAM), although widely used in practice, is both expensive and consumes a lot of power. Algorithmic solutions, which rely on commodity memory chips, are relatively inexpensive and power-efficient, but have not been able to match the generality and performance of TCAMs. In this paper we propose a new approach to packet classification, which combines architectural and algorithmic techniques. Our starting point is the well-known crossproducting algorithm, which is fast but has significant memory overhead due to the extra rules needed to represent the crossproducts. We show how to modify the crossproduct method in a way that drastically reduces the memory required, without compromising on performance. We avoid unnecessary accesses to off-chip memory by filtering off-chip accesses using on-chip Bloom filters. For packets that match p rules in a rule set, our algorithm requires just 4 + p + ǫ independent memory accesses on average, to return all matching rules, where ǫ á 1 is a small constant that depends on the false positive rate of the Bloom filters. Each memory access is just 256 bits, making it practical to classify small packets at OC-192 link rates using two commodity SRAM chips. For rule set sizes ranging from a few hundred to several thousand filters, the average rule set expansion factor attributable to the algorithm is just 1.2. The memory consumption per rule is 36 bytes in the average case

Washington University St. Louis: Open Scholarship

Generalized RAD Module Interface Specification of the Field-programmable Port eXtender (FPX) Version 2.0

Author: Dharmapurikar Sarang
Lockwood John W.
Taylor David E.
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2001
Field of study

The Field-programmable Port eXtender (FPX) provides dynamic, fast, and flexible mechanisms to process data streams at the ports of the Washington University Gigabit Switch (WUGS-20). By performing all computations in FPGA hardware, cells and packets can be processed at the full line speed of the transmission interface, currently 2.4 Gbits/sec. In order to design and implement portable hardware modules for the Reprogrammable Application Devide (RAD) on the FPX board, all modules should conform to a standard interface. This standard interface specifies how modules receive and transmit ATM cells of data flows, prevent data loss during reconfiguration, and access off-chip memory. Module designers should conform to the standard I/O signal names and take special note of timing diagram references

Washington University St. Louis: Open Scholarship

System-on-Chip Packet Processor for an Experimental Network Services Platform

Author: Chandra Alex
Chen Yuhua
Dharmapurikar Sarang
Lockwood John
Tang Wenjing
Taylor David
Turner Jonathan
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2003
Field of study

As the focus of networking research shifts from raw performance to the delivery of advanced network services, there is a growing need for open-platform systems for extensible networking research. The Applied Research Laboratory at Washington University in Saint Louis has developed a ﬂexible Network Services Platform (NSP) to meet this need. The NSP provides an extensible platform for prototyping next-generation network services and applications. This paper describes the design of a system-on-chip Packet Processor for the NSP which performs all core packet processing functions including segmentation and reassembly, packet classiﬁcation, route lookup, and queue management. Targeted to a commercial conﬁgurable logic device, the system is designed to support gigabit links and switch fabrics with a 2:1 speed advantage. We provide resource consumption results for each component of the Packet Processor design

CiteSeerX

Washington University St. Louis: Open Scholarship

Fast and scalable pattern matching for content filtering

Author: John Lockwood
Sarang Dharmapurikar
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2005
Field of study

High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect pre-defined keywords or signatures in the packets. Multi-pattern match-ing is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are being developed for line-speed packet processing. While several pattern matching algorithms have already been de-veloped for such applications, we find that most of them suffer from scalability issues. To support a large number of patterns, the throughput is compromised or vice versa. We present a hardware-implementable pattern matching algo-rithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. We modify the classic Aho-Corasick algorithm to consider multiple characters at a time for higher throughput. Furthermore, we suppress a large fraction of memory accesses by using Bloom filters implemented with a small amount of on-chip memory. The resulting algorithm can support matching of several thousands of patterns at more than 10 Gbps with the help of a less than 50 KBytes of embedded mem-ory and a few megabytes of external SRAM. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snort’s string set

CiteSeerX

Crossref

Fast and scalable pattern matching for network intrusion detection systems

Author: John Lockwood
Member Ieee
Sarang Dharmapurikar
Publication venue
Publication date: 01/01/2006
Field of study

Abstract — High-speed packet content inspection and filtering devices rely on a fast multi-pattern matching algorithm which is used to detect predefined keywords or signatures in the packets. Multi-pattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence specialized hardware-accelerated algorithms are required for line-speed packet processing. We present hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. Our algorithm is based on a memory efficient multi-hashing data structure called Bloom filter. We use embedded on-chip memory blocks in FPGA/VLSI chips to construct Bloom filters which can suppress a large fraction of memory accesses and speed up string matching. Based on this concept, we first present a simple algorithm which can scan for several thousand short (up to 16 bytes) patterns at multi-gigabit per second speeds with a moderately small amount of embedded memory and a few mega bytes of external memory. Furthermore, we modify this algorithm to be able to handle arbitrarily large strings at the cost of a little more on-chip memory. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snort’s string set

CiteSeerX

Design and implementation of a string matching system for network intrusion detection using FPGA-based bloom filters

Author: John Lockwood
Michael Attig
Sarang Dharmapurikar
Publication venue
Publication date: 01/01/2004
Field of study

Modern Network Intrusion Detection Systems (NIDS) inspect the network packet payload to check if it conforms to the security policies of the given network. This process, often referred to as deep packet inspection, involves detection of predefined signature strings or keywords starting at an arbitrary location in the payload. String matching is a computationally intensive task and can become a potential bottleneck without high-speed processing. Since the conventional software-implemented string matching algorithms have not kept pace with the increasing network speeds, special purpose hardware solutions have been introduced. In this paper we show how Bloom filters can be used effectively to perform string matching for thousands of strings at wire speed. We describe how Bloom filters can be implemented feasibly with FPGA. Technology analysis shows that this approach for string matching is more effective than the current FPGA-based string matching circuits which use Deterministic or Non-deterministic Finite Automata (DFA or NFA). Finally, we give the details of our implementation of string matching technique on Xilinx XCV 2000E FPGA.

CiteSeerX

Washington University St. Louis: Open Scholarship